1 Introduction

This evaluation covers six data tables (d1 to d6). The data tables were evaluated with the metrics implemented in the OmicsEV package. The number of samples per class in each data table is shown in the table below.

class d1 d2 d3 d4 d5 d6
Basal 17 17 17 17 17 17
Her2 12 12 12 12 12 12
LumA 19 19 19 19 19 19
LumB 22 22 22 22 22 22
None 16 16 16 16 16 16

The detailed sample information is shown below.

sample class batch order
TCGA.A2.A0CM Basal 1 1
TCGA.A2.A0D0 Basal 1 2
TCGA.A2.A0D1 None 1 3
TCGA.A2.A0D2 Basal 1 4
TCGA.A2.A0EQ Her2 1 5
TCGA.A2.A0EV LumA 1 6
TCGA.A2.A0EX LumA 1 7
TCGA.A2.A0EY LumB 1 8
TCGA.A2.A0SW LumB 1 9
TCGA.A2.A0SX Basal 1 10
TCGA.A2.A0T1 Her2 1 11
TCGA.A2.A0T2 Basal 1 12
TCGA.A2.A0T6 LumA 1 13
TCGA.A2.A0T7 LumA 1 14
TCGA.A2.A0YC LumA 1 15
TCGA.A2.A0YD LumA 1 16
TCGA.A2.A0YF LumA 1 17
TCGA.A2.A0YG LumB 1 18
TCGA.A2.A0YI LumA 1 19
TCGA.A2.A0YL LumA 1 20
TCGA.A2.A0YM Basal 1 21
TCGA.A7.A0CD LumA 1 22
TCGA.A7.A0CE Basal 1 23
TCGA.A7.A0CJ LumB 1 24
TCGA.A8.A06N LumB 1 25
TCGA.A8.A06Z LumB 1 26
TCGA.A8.A076 LumB 1 27
TCGA.A8.A079 LumB 1 28
TCGA.A8.A09G Her2 1 29
TCGA.A8.A09I LumB 1 30
TCGA.AN.A04A None 1 31
TCGA.AN.A0AJ LumB 1 32
TCGA.AN.A0AL Basal 1 33
TCGA.AN.A0AM LumB 1 34
TCGA.AN.A0AS LumA 1 35
TCGA.AN.A0FK LumA 1 36
TCGA.AN.A0FL Basal 1 37
TCGA.AO.A03O None 1 38
TCGA.AO.A0J6 None 1 39
TCGA.AO.A0J9 None 1 40
TCGA.AO.A0JC None 1 41
TCGA.AO.A0JE None 1 42
TCGA.AO.A0JJ None 1 43
TCGA.AO.A0JL None 1 44
TCGA.AO.A0JM None 1 45
TCGA.AO.A126 None 1 46
TCGA.AO.A12B None 1 47
TCGA.AO.A12E None 1 48
TCGA.AR.A0TR LumA 1 49
TCGA.AR.A0TT LumB 1 50
TCGA.AR.A0TV LumB 1 51
TCGA.AR.A0TX Her2 1 52
TCGA.AR.A0U4 None 1 53
TCGA.BH.A0EE Her2 1 54
TCGA.BH.A0HP LumA 1 55
TCGA.A2.A0T3 LumB 2 56
TCGA.A7.A13F LumB 2 57
TCGA.AO.A12D None 2 58
TCGA.AO.A12F None 2 59
TCGA.AR.A0TY LumB 2 60
TCGA.AR.A1AQ Basal 2 61
TCGA.AR.A1AV LumA 2 62
TCGA.AR.A1AW LumB 2 63
TCGA.BH.A0AV Basal 2 64
TCGA.BH.A0C1 LumA 2 65
TCGA.BH.A0C7 LumB 2 66
TCGA.BH.A0E9 LumA 2 67
TCGA.C8.A12L Her2 2 68
TCGA.C8.A12P Her2 2 69
TCGA.C8.A12Q Her2 2 70
TCGA.C8.A12T Her2 2 71
TCGA.C8.A12U LumB 2 72
TCGA.C8.A12V Basal 2 73
TCGA.C8.A12W LumB 2 74
TCGA.C8.A12Z Her2 2 75
TCGA.C8.A130 LumB 2 76
TCGA.C8.A131 Basal 2 77
TCGA.C8.A134 Basal 2 78
TCGA.C8.A135 Her2 2 79
TCGA.C8.A138 Her2 2 80
TCGA.D8.A13Y LumB 2 81
TCGA.D8.A142 Basal 2 82
TCGA.E2.A10A LumA 2 83
TCGA.E2.A150 Basal 2 84
TCGA.E2.A154 LumA 2 85
TCGA.E2.A159 Basal 2 86

2 Overview

The table below gives an overview of all quantitative metrics generated in the evaluation. For each metric, the value of the best data table is highlighted. Details of each metric can be found in the corresponding section below.

metric d1 d2 d3 d4 d5 d6
#identified features 18845 (0.9244) 18845 (0.9244) 18845 (0.9244) 18845 (0.9244) 18845 (0.9244) 18845 (0.9244)
#quantifiable features 17416 (0.8543) 17416 (0.8543) 17416 (0.8543) 17416 (0.8543) 17416 (0.8543) 17416 (0.8543)
non_missing_value_ratio 0.9780 0.9780 0.9780 0.9780 0.9780 0.9780
data_dist_similarity 0.9739 1.0000 0.9864 0.9863 0.9621 0.9766
silhouette_width 0.0145 (0.9855) -0.0009 (0.9991) 0.0139 (0.9861) 0.0144 (0.9856) 0.0200 (0.9800) 0.0214 (0.9786)
pcRegscale 0.1682 (0.8318) 0.2771 (0.7229) 0.1720 (0.8280) 0.1731 (0.8269) 0.0954 (0.9046) 0.0000 (1.0000)
complex_auc 0.6520 0.6340 0.6815 0.6819 0.6320 0.6536
func_auc 0.7758 0.7740 0.7871 0.7984 0.8072 0.7918
class_auc 0.9900 0.9943 0.9887 0.9898 0.9913 0.9940
gene_wise_cor 0.3294 0.3354 0.3374 0.3384 0.3207 0.3283
sample_wise_cor 0.1421 0.1421 0.1421 0.1421 0.1421 0.1383

The radar plot shown below is generated from the data in the overview table above. For the radar plot, each metric is converted, where necessary, to a scale from 0 to 1 on which a higher value indicates better data quality. The converted values are shown in parentheses.
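The conversion rules are not spelled out in the report, but the values in parentheses in the overview table are consistent with the following minimal sketch. The function name `to_radar_scale` and the rules themselves are assumptions inferred from those numbers (for example, 1 - |silhouette_width| and 1 - pcRegscale), not taken from the OmicsEV source code.

```python
# Total number of proteins or genes in the study species, as stated in the report.
TOTAL_FEATURES = 20386

def to_radar_scale(metric, value):
    """Map a raw metric value onto [0, 1] so that higher means better quality.

    The rules are inferred from the parenthesized values in the overview
    table; they are an assumption, not the package's documented behavior.
    """
    if metric in ("#identified features", "#quantifiable features"):
        return value / TOTAL_FEATURES   # fraction of all features in the species
    if metric == "silhouette_width":
        return 1 - abs(value)           # |s| close to 0 suggests little batch structure
    if metric == "pcRegscale":
        return 1 - value                # less batch-associated variance is better
    return value                        # already on a 0-1, higher-is-better scale

print(round(to_radar_scale("#identified features", 18845), 4))  # 0.9244
print(round(to_radar_scale("silhouette_width", -0.0009), 4))    # 0.9991
print(round(to_radar_scale("pcRegscale", 0.1682), 4))           # 0.8318
```

The three printed values match the converted values reported for d1 (identified features and pcRegscale) and d2 (silhouette width).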

3 Data depth

3.1 Study-wise

The table below shows the number of identified proteins or genes for each data table. Proteins or genes that remain after filtering at a 50% missing-value threshold are counted as quantifiable. The values in parentheses are the percentages of proteins or genes identified or quantifiable, relative to the total number of proteins or genes (20386) in the study species.

data table #identified features #quantifiable features
d1 18845 (92.44%) 17416 (85.43%)
d2 18845 (92.44%) 17416 (85.43%)
d3 18845 (92.44%) 17416 (85.43%)
d4 18845 (92.44%) 17416 (85.43%)
d5 18845 (92.44%) 17416 (85.43%)
d6 18845 (92.44%) 17416 (85.43%)

The UpSet chart below shows the overlap in proteins or genes identified in each data table. The numbers of identified proteins or genes shared between data tables are shown in the top bar chart, and the data tables in each set are marked with solid points below the bar chart. The total number of identifications for each data table is shown on the left as ‘Set size’.

3.2 Sample-wise

The figures below show the number of proteins or genes identified in each sample. A gene or protein is considered identified in a sample only when its quantification value in that sample is not “NA”. Samples from different batches are drawn with different shapes and samples from different classes with different colors.

[Figure panels: d1, d2, d3, d4, d5, d6]

3.3 Missing value distribution

The missing value distribution gives an overview of the percentage of missing values across all proteins or genes in both the QC and experiment samples.

data table non_missing_value_ratio
d1 0.978
d2 0.978
d3 0.978
d4 0.978
d5 0.978
d6 0.978

[Figure panels: d1, d2, d3, d4, d5, d6]

4 Data normalization

4.1 Boxplot

The boxplots show the protein or gene expression distribution across samples. The X axis shows the samples in input order. The Y axis shows log2-transformed protein or gene expression. Samples from different classes are drawn in different colors.

[Figure panels: d1, d2, d3, d4, d5, d6]

To quantify the normalization effect, an AUROC test is performed for each pair of samples to measure how well feature abundance distinguishes the two samples. A score is then computed as 1-2*abs(AUROC-0.5). The score ranges from 0 to 1, and higher is better: a score of 1 means there is no systematic difference between the two samples. The final metric for each data table is the median of the scores from all sample pairs.

data table data_dist_similarity n
d1 0.9739 3655
d2 1.0000 3655
d3 0.9864 3655
d4 0.9863 3655
d5 0.9621 3655
d6 0.9766 3655
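As a minimal sketch of the pairwise score described above, the following pure-Python code computes the AUROC between two samples from pairwise comparisons (the Mann-Whitney formulation) and takes the median over all sample pairs. The function names and the toy abundance values are hypothetical, missing values are assumed to have been removed beforehand, and this is not the OmicsEV implementation. The n column in the table is consistent with the number of sample pairs: 86 samples give 86*85/2 = 3655 pairs.

```python
from itertools import combinations
from statistics import median

def auroc(x, y):
    """Probability that a value drawn from x exceeds one drawn from y (ties count 0.5)."""
    wins = 0.0
    for a in x:
        for b in y:
            if a > b:
                wins += 1.0
            elif a == b:
                wins += 0.5
    return wins / (len(x) * len(y))

def pair_score(x, y):
    """1-2*abs(AUROC-0.5): 1 if the two distributions are indistinguishable, 0 if fully separated."""
    return 1 - 2 * abs(auroc(x, y) - 0.5)

def data_dist_similarity(samples):
    """Median pair score over all sample pairs, plus the number of pairs."""
    scores = [pair_score(samples[a], samples[b]) for a, b in combinations(samples, 2)]
    return median(scores), len(scores)

# Toy log2 abundances for three hypothetical samples.
toy = {
    "s1": [10.0, 11.0, 12.0, 13.0],
    "s2": [10.0, 11.0, 12.0, 13.0],
    "s3": [10.5, 11.5, 12.5, 13.5],
}
score, n_pairs = data_dist_similarity(toy)
print(score, n_pairs)
```

The double loop is quadratic in the number of features; for real data tables a rank-based AUROC would be the more practical choice.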

4.2 Density plot

The density plots show the protein or gene expression distribution across samples. X axis is log2 transformed protein or gene expression. Y axis is density.

5 Batch effect

5.1 Silhouette width

The silhouette width s(i) ranges from −1 to 1, with s(i) → 1 if the two clusters are well separated and s(i) → −1 if the two clusters overlap but have dissimilar variance. If s(i) → 0, both clusters have roughly the same structure. Thus, we use the absolute value |s| as an indicator of the presence or absence of batch effects.

data table silhouette_width
d1 0.0145
d2 -0.0009
d3 0.0139
d4 0.0144
d5 0.0200
d6 0.0214
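To illustrate how a silhouette width is obtained when batch labels define the clusters, here is a minimal sketch. It assumes Euclidean distances between sample coordinates (for example, PC scores) and averages s(i) over all samples; the report does not state which distance or coordinates OmicsEV uses, so both are assumptions, and the function name and toy points are hypothetical.

```python
from math import dist
from statistics import mean

def silhouette_width(points, batches):
    """Average silhouette width s over samples, using batch labels as clusters."""
    labels = sorted(set(batches))
    widths = []
    for i, p in enumerate(points):
        # Mean distance from sample i to the samples of each batch (excluding itself).
        mean_dist = {}
        for lab in labels:
            d = [dist(p, q) for j, q in enumerate(points) if batches[j] == lab and j != i]
            if d:
                mean_dist[lab] = mean(d)
        own = batches[i]
        if own not in mean_dist or len(mean_dist) < 2:
            widths.append(0.0)  # common convention for a batch with a single sample
            continue
        a = mean_dist[own]                                            # cohesion within own batch
        b = min(v for lab, v in mean_dist.items() if lab != own)      # nearest other batch
        widths.append((b - a) / max(a, b) if max(a, b) > 0 else 0.0)
    return mean(widths)

# Two toy batches that are far apart: s is close to 1, indicating a strong batch effect.
points = [(0.0, 0.0), (0.0, 1.0), (10.0, 0.0), (10.0, 1.0)]
batches = [1, 1, 2, 2]
s = silhouette_width(points, batches)
print(round(s, 3), round(1 - abs(s), 3))
```

Values of |s| near 0, as in the table above, correspond to batches that are not separated in the chosen coordinates.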

5.2 PCA with batch annotation

For each PC, we calculate Pearson’s correlation coefficient with batch covariate b:

$$r_i = \mathrm{corr}(PC_i, b)$$

In a linear model with a single predictor, as is the case here for the PCs correlated with the batch covariate, the coefficient of determination $R^2$ is the squared Pearson’s correlation coefficient:

$$R^2(PC_i, b) = r_i^2$$

Then we estimate the significance of the correlation coefficient with either a t-test or a one-way ANOVA. $R^2$ values highlighted in red are significant (p-value ≤ 0.05).
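The relationship between the correlation coefficient and the coefficient of determination can be checked numerically. The sketch below uses made-up PC scores for six hypothetical samples in two batches, computes Pearson's r against the batch covariate, and confirms that r squared equals the R² of a least-squares line fitted to the same data. The significance test is not included.

```python
from statistics import mean

def pearson(x, y):
    """Pearson's correlation coefficient between two equal-length sequences."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5

def r2_from_linear_fit(x, y):
    """R^2 of the least-squares line y = intercept + slope * x."""
    mx, my = mean(x), mean(y)
    slope = sum((a - mx) * (b - my) for a, b in zip(x, y)) / sum((a - mx) ** 2 for a in x)
    intercept = my - slope * mx
    ss_res = sum((b - (intercept + slope * a)) ** 2 for a, b in zip(x, y))
    ss_tot = sum((b - my) ** 2 for b in y)
    return 1 - ss_res / ss_tot

# Made-up scores on one PC and the batch label of each sample (illustration only).
pc_scores = [-1.2, -0.8, -1.0, 0.9, 1.1, 0.4]
batch = [1, 1, 1, 2, 2, 2]

r = pearson(pc_scores, batch)
print(round(r ** 2, 6), round(r2_from_linear_fit(batch, pc_scores), 6))
```

Both printed numbers agree, which is the identity used in this section.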

PC d1 d2 d3 d4 d5 d6
1 0.007 0.012 0.01 0.01 0 0.008
2 0.015 0 0.001 0.001 0.041 0.001
3 0.048 0.092 0.013 0.013 0.027 0.006
4 0.1 0.118 0.11 0.11 0.116 0.026
5 0 0.002 0.002 0.002 0.003 0.001
6 0.093 0.007 0.121 0.119 0.076 0.005
7 0.006 0.038 0.005 0.003 0.003 0.006
8 0.01 0.228 0.035 0.038 0.011 0
9 0.033 0.028 0.029 0.027 0.025 0.005
10 0.001 0.002 0.004 0.003 0.015 0.001

The percentage of variance explained by each PC:

PC d1 d2 d3 d4 d5 d6
1 10.8 10.8 10.9 11.0 11.2 11.0
2 7.9 6.7 7.3 7.3 9.1 7.8
3 5.4 5.1 4.9 4.9 5.7 5.3
4 4.5 4.7 4.6 4.6 4.4 4.3
5 4.1 4.0 4.0 4.0 3.9 4.2
6 3.0 2.9 2.9 3.0 3.1 2.9
7 2.8 2.7 2.8 2.9 2.7 2.8
8 2.2 2.2 2.2 2.2 2.1 2.2
9 2.0 2.1 2.1 2.1 2.0 2.0
10 1.9 2.1 1.9 1.9 1.9 1.9

‘Scaled PC regression’ (pcRegscale) is the total variance of the PCs that correlate significantly with the batch covariate (FDR < 0.05), scaled by the total variance of the 10 PCs:

data table pcRegscale
d1 0.1682
d2 0.2771
d3 0.1720
d4 0.1731
d5 0.0954
d6 0.0000
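The pcRegscale values can be reproduced from the variance table once the significant PCs are known. In the sketch below the function name is hypothetical, and the set of significant PCs for d1 (PC4 and PC6) is an inference: the red highlighting is not available here, but this choice of PCs reproduces the reported value from the rounded percentages in the variance table.

```python
def pc_regscale(var_explained, significant_pcs):
    """Variance of batch-associated PCs divided by the variance of all listed PCs."""
    total = sum(var_explained.values())
    batch_related = sum(var_explained[pc] for pc in significant_pcs)
    return batch_related / total

# Percentage of variance explained by PC1..PC10 for d1, copied from the table above.
d1_var = {1: 10.8, 2: 7.9, 3: 5.4, 4: 4.5, 5: 4.1,
          6: 3.0, 7: 2.8, 8: 2.2, 9: 2.0, 10: 1.9}

# Assumed significant PCs for d1 (inferred, see the note above).
print(round(pc_regscale(d1_var, [4, 6]), 4))  # 0.1682

# With no significant PC, the score is 0, as reported for d6.
print(pc_regscale(d1_var, []))                # 0.0
```

The same calculation with PC3, PC4 and PC8 on the d2 column gives 0.2771, again matching the table.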

5.3 Correlation heatmap

In these figures, each row and each column is a sample. The color indicates the correlation between the two samples. The samples are ordered by batch.

[Figure panels: d1, d2, d3, d4, d5, d6]

6 Biological signal

6.1 Correlation among protein complex members

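The table below summarizes the evaluation. ‘diff’ is Cor(intra) - Cor(inter), the difference between the correlation of protein pairs within the same complex (IntraComplex) and that of pairs from different complexes (InterComplex). ‘complex_auc’ is the AUROC value based on the correlations of protein pairs from these two groups.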

data table InterComplex IntraComplex diff complex_auc
d1 0.0413 0.1634 0.1220 0.6520
d2 0.0050 0.1064 0.1013 0.6340
d3 0.0219 0.1715 0.1496 0.6815
d4 0.0201 0.1697 0.1496 0.6819
d5 0.0712 0.1825 0.1112 0.6320
d6 0.0410 0.1621 0.1210 0.6536
Protein 0.0047 0.1722 0.1676 0.6426
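As a sketch of how such a summary can be computed, the code below splits protein pairs into intra-complex and inter-complex groups and computes ‘diff’ and an AUROC for separating the two groups by their correlation. The complex memberships and correlation values are made up, the use of the mean as the group summary is an assumption, and the function names are hypothetical.

```python
from itertools import combinations
from statistics import mean

def auroc(pos, neg):
    """Probability that a positive value exceeds a negative one (ties count 0.5)."""
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

def complex_summary(complexes, cor):
    """Split all protein pairs into intra- and inter-complex groups and summarize them."""
    member_of = {prot: name for name, members in complexes.items() for prot in members}
    intra, inter = [], []
    for a, b in combinations(sorted(member_of), 2):
        value = cor[frozenset((a, b))]
        (intra if member_of[a] == member_of[b] else inter).append(value)
    return {
        "InterComplex": mean(inter),
        "IntraComplex": mean(intra),
        "diff": mean(intra) - mean(inter),
        "complex_auc": auroc(intra, inter),
    }

# Two hypothetical complexes and made-up pairwise correlations.
complexes = {"C1": ["A", "B", "C"], "C2": ["D", "E"]}
raw = {("A", "B"): 0.60, ("A", "C"): 0.50, ("B", "C"): 0.30, ("D", "E"): 0.20,
       ("A", "D"): 0.10, ("A", "E"): 0.00, ("B", "D"): 0.25, ("B", "E"): -0.10,
       ("C", "D"): 0.05, ("C", "E"): 0.15}
cor = {frozenset(pair): value for pair, value in raw.items()}

print(complex_summary(complexes, cor))
```

An AUROC of 0.5 would mean that intra-complex pairs are no more correlated than inter-complex pairs.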

6.2 Gene function prediction

In this evaluation, each data table was used to build a co-expression network. For a selected network and a selected function term (such as a GO or KEGG term), the proteins/genes annotated to the term and included in the network were defined as the positive protein/gene set, and the other proteins/genes in the network constituted the negative protein/gene set for the term. For a selected function term, some of the positive proteins/genes are used as seeds, and a random walk algorithm is then used to calculate scores for the other proteins/genes. A higher score represents a closer relationship between a protein/gene and the seed proteins/genes. Finally, for each selected function term, an AUROC is calculated to evaluate the prediction performance.
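The procedure above can be sketched on a toy network. The code below assumes a random walk with restart on an unweighted graph with a restart probability of 0.5; the specific random walk variant and its parameters used by OmicsEV are not stated in this report, so these are assumptions, and the network, gene names and function names are hypothetical.

```python
def random_walk_with_restart(adj, seeds, restart=0.5, n_iter=100):
    """Iterate p = restart * p0 + (1 - restart) * W p on an unweighted graph."""
    nodes = sorted(adj)
    p0 = {n: (1.0 / len(seeds) if n in seeds else 0.0) for n in nodes}
    p = dict(p0)
    for _ in range(n_iter):
        nxt = {n: restart * p0[n] for n in nodes}
        for u in nodes:
            share = (1 - restart) * p[u] / len(adj[u])  # split mass evenly over neighbors
            for v in adj[u]:
                nxt[v] += share
        p = nxt
    return p

def auroc(pos, neg):
    """Probability that a positive score exceeds a negative one (ties count 0.5)."""
    wins = sum(1.0 if a > b else 0.5 if a == b else 0.0 for a in pos for b in neg)
    return wins / (len(pos) * len(neg))

# Toy co-expression network: genes a-d share a function term, e-h do not.
edges = [("a", "b"), ("a", "c"), ("b", "c"), ("b", "d"), ("c", "d"), ("d", "e"),
         ("e", "f"), ("e", "g"), ("f", "g"), ("g", "h"), ("f", "h")]
adj = {}
for u, v in edges:
    adj.setdefault(u, set()).add(v)
    adj.setdefault(v, set()).add(u)

term_genes = {"a", "b", "c", "d"}
seeds = {"a", "b"}                        # part of the positive set used as seeds
scores = random_walk_with_restart(adj, seeds)

held_out_pos = [scores[g] for g in sorted(term_genes - seeds)]
negatives = [scores[g] for g in sorted(set(adj) - term_genes)]
print(auroc(held_out_pos, negatives))     # 1.0 on this toy network
```

On this toy network the held-out term genes c and d sit next to the seeds and therefore outrank every negative gene, which gives an AUROC of 1.0; real networks yield intermediate values such as those in the table below.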

function term d1 d2 d3 d4 d5 d6 Protein
Acute myeloid leukemia 0.543 0.587 0.641 0.595 0.546 0.543 0.588
Adherens junction 0.619 0.627 0.584 0.542 0.615 0.551 0.606
Adipocytokine signaling pathway 0.651 0.577 0.584 0.582 0.595 0.62 0.62
Alanine, aspartate and glutamate metabolism 0.651 0.58 0.573 0.6 0.702 0.658 0.679
Aldosterone-regulated sodium reabsorption 0.658 0.683 0.535 0.628 0.622 0.606 0.646
Allograft rejection 1 1 1 1 1 1 0.916
Alzheimers disease 0.708 0.646 0.699 0.72 0.724 0.684 0.733
Amino sugar and nucleotide sugar metabolism 0.645 0.641 0.604 0.601 0.66 0.608 0.638
Aminoacyl-tRNA biosynthesis 0.75 0.749 0.778 0.766 0.687 0.739 0.627
Amoebiasis 0.598 0.635 0.598 0.591 0.629 0.597 0.779
Amyotrophic lateral sclerosis (ALS) 0.661 0.607 0.7 0.652 0.648 0.609 0.58
Antigen processing and presentation 0.858 0.803 0.86 0.865 0.773 0.782 0.573
Apoptosis 0.615 0.596 0.584 0.563 0.559 0.63 0.563
Arachidonic acid metabolism 0.563 0.589 0.635 0.662 0.714 0.677 0.58
Arginine and proline metabolism 0.641 0.67 0.586 0.6 0.587 0.606 0.611
Arrhythmogenic right ventricular cardiomyopathy (ARVC) 0.637 0.63 0.638 0.592 0.625 0.676 0.693
Autoimmune thyroid disease 1 0.999 1 1 1 1 0.904
Axon guidance 0.667 0.629 0.604 0.59 0.633 0.615 0.56
B cell receptor signaling pathway 0.599 0.58 0.522 0.568 0.526 0.624 0.674
Bacterial invasion of epithelial cells 0.615 0.556 0.548 0.556 0.564 0.574 0.767
Basal transcription factors 0.627 0.487 0.519 0.522 0.588 0.556 0.669
Base excision repair 0.668 0.619 0.612 0.625 0.691 0.721 0.757
beta-Alanine metabolism 0.591 0.586 0.687 0.587 0.682 0.658 0.69
Bile secretion 0.609 0.593 0.594 0.602 0.684 0.587 0.646
Biosynthesis of unsaturated fatty acids 0.679 0.679 0.552 0.735 0.651 0.6 0.83
Bladder cancer 0.621 0.635 0.57 0.542 0.61 0.633 0.621
Butanoate metabolism 0.722 0.626 0.619 0.647 0.648 0.624 0.736
Calcium signaling pathway 0.609 0.535 0.585 0.613 0.606 0.6 0.61
Carbohydrate digestion and absorption 0.637 0.683 0.686 0.661 0.689 0.617 0.761
Cardiac muscle contraction 0.704 0.654 0.688 0.703 0.683 0.72 0.878
Cell adhesion molecules (CAMs) 0.796 0.786 0.788 0.815 0.812 0.795 0.77
Cell cycle 0.737 0.728 0.742 0.729 0.692 0.732 0.747
Chagas disease (American trypanosomiasis) 0.594 0.6 0.63 0.571 0.542 0.566 0.581
Chemokine signaling pathway 0.576 0.561 0.564 0.561 0.61 0.535 0.715
Citrate cycle (TCA cycle) 0.637 0.768 0.769 0.739 0.643 0.658 0.866
Collecting duct acid secretion 0.688 0.733 0.674 0.658 0.626 0.688 0.722
Colorectal cancer 0.547 0.528 0.576 0.573 0.589 0.596 0.646
Complement and coagulation cascades 0.867 0.842 0.78 0.867 0.838 0.874 0.907
Cysteine and methionine metabolism 0.61 0.615 0.593 0.653 0.705 0.712 0.622
Cytokine-cytokine receptor interaction 0.701 0.626 0.772 0.756 0.776 0.739 0.639
Cytosolic DNA-sensing pathway 0.65 0.617 0.687 0.727 0.673 0.702 0.583
Dilated cardiomyopathy 0.613 0.592 0.625 0.605 0.677 0.623 0.699
DNA replication 0.723 0.758 0.787 0.775 0.843 0.8 0.866
Drug metabolism - cytochrome P450 0.762 0.747 0.68 0.743 0.776 0.731 0.658
Drug metabolism - other enzymes 0.63 0.67 0.616 0.672 0.631 0.642 0.662
ECM-receptor interaction 0.857 0.851 0.82 0.816 0.883 0.866 0.916
Endocytosis 0.564 0.563 0.568 0.585 0.602 0.59 0.601
Endometrial cancer 0.593 0.539 0.596 0.551 0.57 0.531 0.631
Epithelial cell signaling in Helicobacter pylori infection 0.549 0.565 0.562 0.541 0.568 0.533 0.706
ErbB signaling pathway 0.579 0.6 0.62 0.591 0.576 0.553 0.535
Ether lipid metabolism 0.685 0.7 0.675 0.66 0.806 0.71 0.61
Fatty acid elongation in mitochondria 0.69 0.535 0.701 0.671 0.68 0.633 0.637
Fatty acid metabolism 0.652 0.642 0.6 0.628 0.684 0.543 0.717
Fc epsilon RI signaling pathway 0.555 0.564 0.636 0.533 0.567 0.616 0.68
Fc gamma R-mediated phagocytosis 0.584 0.671 0.65 0.659 0.61 0.638 0.74
Focal adhesion 0.669 0.632 0.666 0.671 0.672 0.686 0.75
Fructose and mannose metabolism 0.641 0.612 0.582 0.659 0.622 0.652 0.651
Galactose metabolism 0.623 0.605 0.642 0.599 0.639 0.554 0.641
Gap junction 0.595 0.584 0.565 0.619 0.551 0.622 0.706
Gastric acid secretion 0.596 0.686 0.572 0.635 0.608 0.579 0.618
Glioma 0.587 0.629 0.559 0.601 0.599 0.624 0.536
Glutathione metabolism 0.614 0.641 0.648 0.608 0.632 0.638 0.637
Glycerolipid metabolism 0.593 0.546 0.545 0.605 0.572 0.653 0.637
Glycerophospholipid metabolism 0.596 0.554 0.592 0.562 0.636 0.617 0.619
Glycine, serine and threonine metabolism 0.551 0.563 0.63 0.628 0.688 0.739 0.583
Glycolysis / Gluconeogenesis 0.602 0.565 0.622 0.637 0.595 0.624 0.743
Glyoxylate and dicarboxylate metabolism 0.797 0.677 0.648 0.698 0.694 0.641 0.677
GnRH signaling pathway 0.628 0.685 0.705 0.671 0.678 0.669 0.578
Graft-versus-host disease 1 1 1 1 1 1 0.885
Hematopoietic cell lineage 0.741 0.766 0.809 0.759 0.791 0.717 0.727
Hepatitis C 0.534 0.553 0.565 0.583 0.62 0.642 0.612
Huntingtons disease 0.717 0.67 0.72 0.745 0.741 0.717 0.796
Hypertrophic cardiomyopathy (HCM) 0.62 0.598 0.625 0.574 0.691 0.644 0.657
Inositol phosphate metabolism 0.587 0.633 0.543 0.632 0.573 0.566 0.578
Intestinal immune network for IgA production 0.759 0.764 0.906 0.902 0.898 0.813 0.79
Jak-STAT signaling pathway 0.681 0.703 0.617 0.641 0.674 0.619 0.577
Leishmaniasis 0.69 0.696 0.676 0.685 0.708 0.68 0.733
Leukocyte transendothelial migration 0.603 0.628 0.64 0.656 0.622 0.576 0.794
Long-term depression 0.553 0.553 0.586 0.567 0.727 0.616 0.539
Long-term potentiation 0.55 0.607 0.559 0.561 0.598 0.663 0.569
Lysine degradation 0.73 0.646 0.607 0.626 0.587 0.641 0.619
Lysosome 0.539 0.541 0.585 0.567 0.528 0.516 0.637
Malaria 0.769 0.73 0.774 0.798 0.822 0.806 0.825
Melanogenesis 0.605 0.643 0.674 0.686 0.607 0.608 0.59
Melanoma 0.53 0.533 0.596 0.599 0.547 0.642 0.643
Metabolic pathways 0.589 0.592 0.602 0.593 0.603 0.598 0.627
Metabolism of xenobiotics by cytochrome P450 0.754 0.731 0.787 0.79 0.81 0.758 0.682
Mismatch repair 0.73 0.721 0.755 0.741 0.748 0.746 0.862
mRNA surveillance pathway 0.566 0.56 0.511 0.567 0.636 0.576 0.75
mTOR signaling pathway 0.626 0.577 0.611 0.58 0.596 0.578 0.705
N-Glycan biosynthesis 0.749 0.678 0.761 0.797 0.772 0.62 0.702
Natural killer cell mediated cytotoxicity 0.607 0.53 0.644 0.573 0.563 0.577 0.788
NOD-like receptor signaling pathway 0.603 0.562 0.624 0.648 0.553 0.658 0.603
Non-small cell lung cancer 0.613 0.574 0.522 0.55 0.55 0.568 0.6
Notch signaling pathway 0.578 0.594 0.659 0.55 0.554 0.608 0.749
Nucleotide excision repair 0.61 0.589 0.588 0.665 0.689 0.665 0.743
Oocyte meiosis 0.628 0.613 0.606 0.643 0.622 0.656 0.537
Osteoclast differentiation 0.646 0.675 0.625 0.573 0.652 0.588 0.688
Oxidative phosphorylation 0.79 0.696 0.78 0.799 0.8 0.788 0.877
p53 signaling pathway 0.532 0.638 0.636 0.59 0.617 0.612 0.575
Pancreatic cancer 0.611 0.543 0.594 0.612 0.624 0.619 0.544
Pancreatic secretion 0.607 0.574 0.604 0.6 0.575 0.573 0.629
Parkinsons disease 0.782 0.712 0.781 0.775 0.771 0.784 0.883
Pathogenic Escherichia coli infection 0.602 0.647 0.658 0.62 0.598 0.566 0.686
Pentose phosphate pathway 0.64 0.669 0.682 0.623 0.628 0.728 0.705
Peroxisome 0.546 0.559 0.659 0.611 0.703 0.534 0.598
Phagosome 0.7 0.703 0.699 0.712 0.77 0.703 0.713
PPAR signaling pathway 0.558 0.651 0.629 0.611 0.618 0.572 0.572
Prion diseases 0.687 0.641 0.637 0.67 0.63 0.67 0.76
Progesterone-mediated oocyte maturation 0.611 0.635 0.629 0.582 0.614 0.61 0.637
Propanoate metabolism 0.782 0.626 0.702 0.642 0.629 0.753 0.681
Prostate cancer 0.511 0.587 0.582 0.584 0.532 0.571 0.614
Proteasome 0.86 0.794 0.86 0.848 0.816 0.857 0.808
Protein digestion and absorption 0.89 0.897 0.886 0.88 0.867 0.847 0.868
Protein processing in endoplasmic reticulum 0.722 0.698 0.742 0.742 0.727 0.722 0.633
Purine metabolism 0.607 0.594 0.575 0.621 0.549 0.574 0.619
Pyrimidine metabolism 0.545 0.534 0.552 0.599 0.61 0.571 0.633
Pyruvate metabolism 0.643 0.584 0.643 0.609 0.611 0.722 0.603
Regulation of actin cytoskeleton 0.636 0.616 0.63 0.621 0.621 0.618 0.652
Retinol metabolism 0.768 0.733 0.727 0.631 0.617 0.685 0.756
Rheumatoid arthritis 0.698 0.639 0.72 0.699 0.71 0.717 0.663
Ribosome 0.927 0.835 0.891 0.898 0.91 0.914 0.924
Ribosome biogenesis in eukaryotes 0.682 0.722 0.753 0.737 0.649 0.744 0.805
RIG-I-like receptor signaling pathway 0.562 0.475 0.625 0.598 0.652 0.65 0.627
RNA degradation 0.611 0.628 0.597 0.626 0.665 0.658 0.789
RNA polymerase 0.683 0.685 0.634 0.605 0.607 0.701 0.8
RNA transport 0.669 0.666 0.669 0.668 0.682 0.679 0.752
Salivary secretion 0.658 0.568 0.606 0.524 0.659 0.623 0.668
Shigellosis 0.574 0.596 0.589 0.592 0.639 0.545 0.673
Small cell lung cancer 0.541 0.6 0.556 0.549 0.574 0.606 0.67
SNARE interactions in vesicular transport 0.697 0.669 0.606 0.684 0.651 0.655 0.773
Sphingolipid metabolism 0.564 0.634 0.686 0.601 0.661 0.67 0.629
Spliceosome 0.749 0.743 0.76 0.78 0.733 0.754 0.763
Staphylococcus aureus infection 0.889 0.878 0.896 0.894 0.849 0.9 0.839
Starch and sucrose metabolism 0.643 0.656 0.676 0.692 0.634 0.602 0.65
Systemic lupus erythematosus 0.812 0.78 0.832 0.816 0.803 0.803 0.84
T cell receptor signaling pathway 0.542 0.582 0.55 0.57 0.577 0.594 0.606
Terpenoid backbone biosynthesis 0.64 0.672 0.734 0.73 0.655 0.69 0.84
TGF-beta signaling pathway 0.619 0.617 0.591 0.637 0.601 0.64 0.587
Tight junction 0.578 0.583 0.572 0.601 0.612 0.623 0.538
Toll-like receptor signaling pathway 0.585 0.575 0.58 0.529 0.596 0.561 0.665
Toxoplasmosis 0.581 0.594 0.547 0.603 0.565 0.599 0.626
Tryptophan metabolism 0.591 0.704 0.696 0.684 0.685 0.589 0.646
Type I diabetes mellitus 0.868 0.905 0.871 0.913 0.849 0.917 0.729
Tyrosine metabolism 0.762 0.768 0.748 0.742 0.717 0.711 0.61
Ubiquitin mediated proteolysis 0.691 0.664 0.634 0.676 0.649 0.622 0.608
Valine, leucine and isoleucine degradation 0.671 0.543 0.766 0.699 0.689 0.69 0.772
Vascular smooth muscle contraction 0.53 0.546 0.599 0.578 0.624 0.572 0.613
Vasopressin-regulated water reabsorption 0.639 0.681 0.529 0.605 0.613 0.623 0.634
VEGF signaling pathway 0.591 0.562 0.592 0.534 0.541 0.504 0.656
Vibrio cholerae infection 0.609 0.516 0.629 0.565 0.586 0.623 0.802
Viral myocarditis 0.698 0.71 0.742 0.764 0.747 0.745 0.673
Wnt signaling pathway 0.665 0.626 0.576 0.59 0.598 0.714 0.609

[Figure panels: d1, d2, d3, d4, d5, d6]

6.3 Sample class prediction

For each data table, machine learning models are built to predict the sample class (LumA vs. LumB). In OmicsEV, random forest models are built and evaluated using 5-fold cross-validation repeated 20 times.

dataSet mean_ROC median_ROC sd_ROC
d1 0.9900 0.9910 0.0054
d2 0.9943 0.9952 0.0029
d3 0.9887 0.9880 0.0035
d4 0.9898 0.9904 0.0031
d5 0.9913 0.9922 0.0033
d6 0.9940 0.9952 0.0026
Protein 0.7949 0.7972 0.0136
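The resampling scheme can be sketched as follows. The code generates stratified 5-fold splits, repeats them 20 times, and summarizes the per-fold scores with mean, median and standard deviation. The `evaluate_fold` callback is a hypothetical stand-in for training a random forest on the training indices and computing the AUROC on the held-out fold; whether OmicsEV summarizes per fold or per repeat is not stated in this report, so pooling all folds is an assumption.

```python
import random
from statistics import mean, median, stdev

def stratified_folds(labels, k, rng):
    """Assign sample indices to k folds so that each class is spread evenly."""
    by_class = {}
    for idx, lab in enumerate(labels):
        by_class.setdefault(lab, []).append(idx)
    folds = [[] for _ in range(k)]
    offset = 0
    for lab in sorted(by_class):
        members = by_class[lab]
        rng.shuffle(members)
        for pos, idx in enumerate(members):
            folds[(pos + offset) % k].append(idx)
        offset += len(members)  # continue where the previous class stopped
    return folds

def repeated_cv(labels, evaluate_fold, k=5, repeats=20, seed=0):
    """Run repeated stratified k-fold CV and summarize the per-fold scores."""
    rng = random.Random(seed)
    scores = []
    for _ in range(repeats):
        for test_idx in stratified_folds(labels, k, rng):
            held_out = set(test_idx)
            train_idx = [i for i in range(len(labels)) if i not in held_out]
            scores.append(evaluate_fold(train_idx, test_idx))
    return {"n": len(scores), "mean_ROC": mean(scores),
            "median_ROC": median(scores), "sd_ROC": stdev(scores)}

# Class sizes from the report: 19 LumA and 22 LumB samples.
labels = ["LumA"] * 19 + ["LumB"] * 22

# Placeholder scorer (not a real result): a real run would fit a model here.
summary = repeated_cv(labels, lambda train_idx, test_idx: 1.0)
print(summary["n"])  # 100 resamples = 5 folds x 20 repeats
```

With 41 samples, each held-out fold contains 8 or 9 samples, and every fold includes both classes.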

6.4 PCA with sample class annotation

[Figure panels: d1, d2, d3, d4, d5, d6]

6.5 Unsupervised clustering

[Figure panels: d1, d2, d3, d4, d5, d6]

7 Multi-omics concordance

7.1 Gene-wise mRNA-protein correlation

data table n n5 n6 n7 n8 gene_wise_cor
d1 9120 1911 773 210 21 0.3294
d2 9120 1970 827 222 23 0.3354
d3 9120 2010 827 227 25 0.3374
d4 9120 2016 837 224 27 0.3384
d5 9120 1763 696 185 20 0.3207
d6 9120 1932 763 205 20 0.3283

[Figure panels: d1, d2, d3, d4, d5, d6]

7.2 Sample-wise mRNA-protein correlation

data table sample_wise_cor
d1 0.1421
d2 0.1421
d3 0.1421
d4 0.1421
d5 0.1421
d6 0.1383